April 1999








XML for the absolute beginner

A guided tour from HTML to processing XML with Java




Page 10 of 10


Become a tree surgeon!
One final, somewhat more advanced topic, before we close. The SAX interface allows you to parse an XML file and execute particular actions whenever certain structures (like tags) appear in the input. That's great for a lot of applications. There are times, though, when you want to be able to cut and paste whole sections of XML documents, restructure them, or maybe even build from scratch an object structure like the one in Figure 3, and then save the whole structure as an XML file. For that, you need access to the DOM API.

The DOM API allows you to represent your XML document as a tree of nodes in your Java (or other language) program. While a SAX parser reads an XML file and makes callbacks to a user-defined class, a DOM parser reads an XML file and returns a representation of the file as a tree of objects, most of which are of type org.w3c.dom.Node. This gives you immense power in manipulating structured documents. Figure 4 is an example of what I'm talking about.
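To make the tree idea concrete, here is a minimal sketch of parsing a small document and walking the resulting nodes. It assumes the JAXP classes (javax.xml.parsers) bundled with current JDKs, which is my choice for illustration and not necessarily the parser you have installed. The class name DomTreeWalk and the sample recipe document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class: parses XML text and prints an outline of its DOM tree.
class DomTreeWalk {

    /** Parses XML text with the JDK's default DOM parser (an assumption of this sketch). */
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Appends one line per element or non-blank text node, indented two spaces per level. */
    static void outline(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            indent(out, depth);
            out.append(node.getNodeName()).append('\n');
        } else if (node.getNodeType() == Node.TEXT_NODE
                && !node.getNodeValue().trim().isEmpty()) {
            indent(out, depth);
            out.append('"').append(node.getNodeValue().trim()).append("\"\n");
        }
        // Every Node exposes its children the same way, whatever its type.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            outline(children.item(i), depth + 1, out);
        }
    }

    private static void indent(StringBuilder out, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<recipe><title>Lime Jello</title><item>water</item></recipe>");
        StringBuilder out = new StringBuilder();
        outline(doc.getDocumentElement(), 0, out);
        System.out.print(out);
    }
}
```

Notice that the element names and the text inside them are separate nodes: the text "Lime Jello" is a child of the title element, not a property of it.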


Figure 4. A DOM document transformation system

The Document Object Model, in the package org.w3c.dom, defines interfaces for document elements (that is, tags), DTD elements, text nodes (where the actual text inside the tags is kept), and quite a few other things we haven't even discussed. Figure 4 is a schematic of a general system that can transform one XML document to some other form programmatically. Your program uses a DOM parser to parse an XML file, and the parser returns a tree that is an exact representation of the XML in the file. Note that, at this point, you've read an input file, checked it for well-formedness (and, with a validating parser, for validity against its DTD), and built a complex hierarchical object structure, all in just a few lines of code. You can then traverse the document tree in software, doing whatever you like to the tree structure. Add nodes, delete them, update their values, read or set their attributes -- basically anything you like. When your tree has the new structure you desire, tell the top node to print itself to another XML file, and the new document is created.
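The parse, modify, and write cycle of Figure 4 might look something like the sketch below. How a tree gets written back out depends on the parser you use; this sketch assumes the JAXP identity Transformer (javax.xml.transform) that ships with current JDKs as a stand-in for "print itself." The class name DomTreeSurgery, the memo document, and its draft, note, and status names are all hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class: parse a document, operate on the tree, write it back out.
class DomTreeSurgery {

    static String transform(String xml) throws Exception {
        // 1. Parse the input into a DOM tree.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();

        // 2a. Delete nodes: remove every <draft> element. The NodeList is live,
        //     so iterate from the end to keep the indexes stable.
        NodeList drafts = root.getElementsByTagName("draft");
        for (int i = drafts.getLength() - 1; i >= 0; i--) {
            Node draft = drafts.item(i);
            draft.getParentNode().removeChild(draft);
        }

        // 2b. Set an attribute on the root element.
        root.setAttribute("status", "reviewed");

        // 2c. Add a new element with a text node inside it.
        Element note = doc.createElement("note");
        note.appendChild(doc.createTextNode("checked"));
        root.appendChild(note);

        // 3. Serialize the modified tree (identity transform, no XML declaration).
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String input = "<memo status=\"new\"><body>Hello</body><draft>old text</draft></memo>";
        System.out.println(transform(input));
    }
}
```

To write to a file instead of a string, you could hand StreamResult a java.io.File or an OutputStream.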

XML-Java synergy
One of the reasons Java and XML are so well-suited for one another is that Java and XML are both extensible: Java through its class loaders, XML through its DTD. Imagine a server, reading and writing XML, where the DTD for the system input can change. When a new element is added to the input language, a running server (written in Java) could automatically load new Java classes to handle the new tags. You would not only have an extensible application server -- you wouldn't even have to take the server down to add the extensions!
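Here is one way the class-loading half of that idea could be sketched. It assumes a simple registry that maps tag names to handler class names and loads each class by name the first time its tag is seen. TagHandler, UppercaseHandler, HandlerRegistry, and the shout tag are all hypothetical names invented for this illustration. A real server would also need a way to learn about new mappings (a configuration file, say), which is left out here.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical contract that every tag-handling class implements.
interface TagHandler {
    String handle(String text);
}

// Hypothetical handler for an imagined <shout> tag.
class UppercaseHandler implements TagHandler {
    public String handle(String text) {
        return text.toUpperCase(Locale.ROOT);
    }
}

// Maps tag names to handler class names and loads the classes on demand.
class HandlerRegistry {
    private final Map<String, String> classNames = new ConcurrentHashMap<>();
    private final Map<String, TagHandler> loaded = new ConcurrentHashMap<>();

    /** Associates a tag with a handler class; can be called while the server runs. */
    void register(String tag, String className) {
        classNames.put(tag, className);
        loaded.remove(tag); // force a reload if the mapping changed
    }

    /** Returns the handler for a tag, loading its class by name on first use. */
    TagHandler handlerFor(String tag) throws ReflectiveOperationException {
        TagHandler handler = loaded.get(tag);
        if (handler == null) {
            String className = classNames.get(tag);
            if (className == null) {
                throw new IllegalArgumentException("No handler registered for <" + tag + ">");
            }
            // The class is located and loaded here, at run time, not at compile time.
            handler = Class.forName(className)
                    .asSubclass(TagHandler.class)
                    .getDeclaredConstructor()
                    .newInstance();
            loaded.put(tag, handler);
        }
        return handler;
    }

    public static void main(String[] args) throws Exception {
        HandlerRegistry registry = new HandlerRegistry();
        registry.register("shout", UppercaseHandler.class.getName());
        System.out.println(registry.handlerFor("shout").handle("hello"));
    }
}
```

The registry never refers to UppercaseHandler by type, only by name, so a handler class written after the server started could be registered the same way, provided the class loader can find it.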

This one small idea hints at what XML and Java can accomplish together. The next section is about a company whose combination of XML and Java is its core technology.

XML with Java in the real world
You now have a handle on XML technology, including how it's implemented in Java. You understand that a document can be processed as a stream of events with SAX, or viewed as a tree of objects and manipulated with DOM. Let's have a look at a real company that is using all of these technologies to provide solutions for its clients.

DOM interfaces exist not only for XML, but for HTML, as well. This means that the leftmost document in Figure 4 could be a Web page from which you wish to extract information for manipulation in Java.

In fact, Epicentric, an Internet startup in San Francisco, does just that. Epicentric uses Java and XML in its turnkey systems to allow creation of custom portal sites. Portal sites, like the front pages of Netscape Netcenter and Excite!, are integrated aggregations of information from various Internet sources. In a corporate Internet environment, a portal may contain information gleaned from external Web pages (for example, weather reports), alongside internal enterprise data. Portals are also often customizable by each user.

Epicentric's systems read HTML from the Internet as DOM documents, extract information from those documents, and store that information in a standard XML format. Other information sources are also converted into this same XML format and stored on Epicentric's server. The company then uses the XML with XSL and JavaServer Pages to create custom portals for its clients.
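The general technique (not Epicentric's actual code or format, neither of which is described here) can be sketched as follows. The sketch assumes the page is well-formed XHTML with no DOCTYPE, so that the JDK's ordinary XML parser can read it; real-world HTML usually needs a more forgiving HTML-aware parser. The class name PageHarvester and the links/link/url output vocabulary are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class: pulls hyperlinks out of a page and
// re-expresses them in a small, made-up XML vocabulary.
class PageHarvester {

    static Document harvestLinks(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

        // Read the page as a DOM document (assumes well-formed XHTML).
        Document page = builder.parse(new InputSource(new StringReader(xhtml)));

        // Build a brand-new document to hold the extracted information.
        Document result = builder.newDocument();
        Element links = result.createElement("links");
        result.appendChild(links);

        NodeList anchors = page.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element anchor = (Element) anchors.item(i);
            if (!anchor.hasAttribute("href")) {
                continue; // a named anchor, not a hyperlink
            }
            Element link = result.createElement("link");
            link.setAttribute("url", anchor.getAttribute("href"));
            link.appendChild(result.createTextNode(anchor.getTextContent().trim()));
            links.appendChild(link);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><p>See the "
                + "<a href=\"http://example.com/weather\">Weather</a> page.</p></body></html>";
        Element link = (Element) harvestLinks(page).getDocumentElement().getFirstChild();
        System.out.println(link.getAttribute("url") + " : " + link.getTextContent());
    }
}
```

Once the information is in a uniform XML format like this, it no longer matters which page it came from; anything downstream works with the same structure.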

"A lot of good work has been done on the basics ... like parsers and XSL processors," says Ed Anuff, CEO of Epicentric. One benefit of using XML, Anuff says, is that it forces designers to think through a system's structure rigorously.

When asked about concerns with XML, Anuff states that many of the problems he runs into are architectural, such as which DTD to use, and designating the appropriate places in the system to use XML. Systems designers are still working out how to use this new technology most effectively in an enterprise environment.

Also, since the technology is so new, it's often hard to know what pieces of the system to build in-house. For example, quite a few companies built their own XML parsers but now have little return on investment because larger companies are developing superior XML technology and giving it away for free. "The biggest challenge today is figuring out when you're reinventing the wheel, and when you're adding value," says Anuff.

Despite these challenges, the future looks bright for Epicentric, which has several "pretty decent-sized customers" using the company's software in beta. With clients and advertisers that include the likes of Eastman Kodak Company, Sun Microsystems, Chase Bank, and LIFE Magazine, Epicentric is using XML to aggregate and redistribute information in novel ways.

Conclusion
XML is a powerful data representation technology for which Java is uniquely well-suited. You're going to be hearing a lot about XML in the coming months and years. Anyone working with information systems that communicate with other systems (and what systems don't, these days?) has a lot to gain by understanding XML technology and using it to its full advantage.

Using XML with XSL or CSS, you can manage your Web site's content and style, and change style in one place (the style sheet) instead of editing piles of HTML files or, worse, editing the scripts that produce HTML dynamically. Using SAX or DOM, you can treat Web documents as object structures and process them in a general and clean way. Or, you can leave browsers behind entirely and write pure-Java clients and servers that talk to each other -- and other systems -- in XML, the new lingua franca of the Internet. Sun Microsystems, the creator of Java, has perhaps best described the power of XML and Java together in its slogan: Portable Code -- Portable Data. Start experimenting with XML in Java, and you'll soon wonder how you ever lived without it.

Thanks to Dave Orchard for his comments on drafts of this article, and to the many helpful people I met in San Jose, CA.


Page 1 XML for the absolute beginner
Page 2 HTML: All form and no substance
Page 3 An XML conceptual example
Page 4 Make up a markup
Page 5 So, what good is made-up markup?
Page 6 Cascading Style Sheets: not just for HTML anymore
Page 7 XSL: I like your style
Page 8 Modeling information structure in XML
Page 9 XML and Java
Page 10 Become a tree surgeon!


About the author
Mark Johnson lives in Fort Collins, CO, and is a C++ programmer by day and Java columnist by night. Very late night.





Copyright © 2003 JavaWorld.com, an IDG company